The rate-distortion function with a mean square error distortion criterion
is investigated for a class of Gaussian Markov sources. It is found that for
rates greater than a certain minimum, the rate-distortion function is
equivalent to that of an independent letter source. This minimum rate is
found to be less than n bits per symbol, where n is the order of the Markov
sequence. Comparisons are made between the rate-distortion function and two
quantizing systems.

I. INTRODUCTION 

Suppose in the communication system of Fig. 1, the source emits a 
sequence of continuous-valued random variables. The exact specifica- 
tion of such variates requires an infinite number of binary digits. Hence 
exact transmission would require a channel of infinite capacity. Since 
no physical channels possess infinite capacity, we see that exact trans- 
mission is impossible through this system. 

However, if we are willing to accept some error in our specification
of the source output, then only finitely many binary digits are necessary.
In the study of digital encoding systems, a useful quantity to know is 
the fewest number of binary digits necessary to represent an analog 
signal within a certain error. Such a quantity would give us a perform- 
ance criterion with which to compare existing systems, and also tell us 
how much improvement is possible. 

The quantity we seek is given by Shannon's rate-distortion function
(Refs. 1, 2). The rate-distortion function gives, for any bit rate, the
minimum possible error achievable.

In this paper we study the rate-distortion functions for the important
class of Gaussian Markov sources. We measure our error by the mean
square error criterion. Also, the performance of two quantizing systems,
differential PCM and block quantizing, is compared to the rate-distortion
bound.

* This research was partially supported by the Air Force Office of Scientific
Research under Contract AF 49(638)-1600. This paper is part of a dissertation
submitted in 1969 to the Faculty of the Polytechnic Institute of Brooklyn, in partial
fulfillment of the requirements for the Ph.D. degree in systems science.

Fig. 1. General communication system.

II. DISCUSSION OF RESULTS 

We have studied the rate-distortion functions of Gaussian Markov
sources with a mean square error criterion. We express our results in
Fig. 2 by plotting signal-to-noise ratio in dB versus bit rate R. The
signal-to-noise ratio is given by

$$S/N = 10 \log_{10} \frac{\sigma^2}{D} \tag{1}$$

where $\sigma^2$ is the variance of the source output, and $D$ is the mean
square error.



It was found that for rates R greater than a certain $R_{\min}$, the
rate-distortion function is given by

$$R = \tfrac{1}{2} \log_2 \frac{\sigma_m^2}{D} \tag{2}$$

or

$$S/N = 6.02R + 10 \log_{10} \frac{\sigma^2}{\sigma_m^2} \tag{3}$$

where $\sigma_m^2$ is the minimum mean square prediction error one step ahead.
The point $R_{\min}$ occurs in the interval (0, n) where n is the order of the
Markov process that the source emits. The exact location of $R_{\min}$
depends on the exact shape of the power spectral density of the process,
as we shall see. At $R = R_{\min}$, the rate-distortion function has a
discontinuity in the third derivative.

Fig. 2. Rate-distortion bound of a Markov-n source compared with block
quantizing system and differential PCM.

If the source were followed by the optimum prediction system of Fig.
3 then the output sequence produced would be uncorrelated with variance
$\sigma_m^2$. Such a sequence has the rate-distortion function given by (2).
Hence for rates greater than $R_{\min}$ the sequences at the input and output
of the prediction system have equal rate-distortion functions. For rates
less than $R_{\min}$ they do not.

A lower bound on the performance achievable by the block quantizing 
system of Fig. 4 was found. The result is also shown in Fig. 2, where it 
is seen that this system can be made to perform within 4.34 dB of the 
bound. 

Also shown in Fig. 2 is the performance bound for a differential PCM
system (see Fig. 5) as derived by O'Neal. This bound, however, holds
only for high bit rates.

III. RATE DISTORTION FUNCTIONS FOR MARKOV-N SOURCES 

3.1 Introduction 

Fig. 3. Predictive communication system.

Fig. 4. Block quantizer for correlated source.

Consider again the communication system of Fig. 1. The source emits
the discrete time, stationary random process $x_t$, $t = 0, \pm 1, \pm 2, \cdots$.
After N seconds, a column N-vector $X$ is obtained, and after encoding,
transmission and decoding, the receiver obtains a replica $\hat{X}$ of $X$. The
mean square error between the transmitted and received vectors is

$$D = \frac{1}{N} E (X - \hat{X})^T (X - \hat{X}) \tag{4}$$

where E denotes expectation and $X^T$ is the transpose of $X$. It is
reasonable to ask what the minimum bit rate is, at which we must transmit,
so as to be able to achieve a mean square error less than some prescribed
amount. The answer is given by Shannon's rate-distortion function
which is defined as follows:



$$R(D) = \lim_{N \to \infty} \min \frac{1}{N} \iint p(X_N)\, p(\hat{X}_N \mid X_N) \log_2 \frac{p(\hat{X}_N \mid X_N)}{p(\hat{X}_N)} \, dX_N \, d\hat{X}_N \tag{5}$$

where the minimization is taken over all $p(\hat{X}_N \mid X_N)$ satisfying

$$\frac{1}{N} \iint (X_N - \hat{X}_N)^T (X_N - \hat{X}_N)\, p(X_N)\, p(\hat{X}_N \mid X_N) \, dX_N \, d\hat{X}_N \le D \tag{6}$$

and where

$p(X_N)$ = probability measure of the source vector $X_N$
$p(\hat{X}_N \mid X_N)$ = conditional probability measure of $\hat{X}_N$ given $X_N$
$p(\hat{X}_N)$ = probability measure induced on $\hat{X}_N$ by $p(X_N)$ and
$p(\hat{X}_N \mid X_N)$.

(The subscript N is included to emphasize that we are dealing with an
N-vector.)

Fig. 5. Differential pulse code modulation system.

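Suppose the source emits a stationary Gaussian time series with
correlations $E(x_j x_k) = r_{j-k} = r_\tau$. Then the discrete time power
spectral density is given by

$$f(\lambda) = \sum_{\tau=-\infty}^{\infty} r_\tau e^{-i\tau\lambda}, \qquad -\pi < \lambda < \pi \tag{7}$$

and the rate distortion function is given parametrically by (Ref. 3; see
Fig. 6 for interpretation)

$$R(\phi) = \frac{1}{4\pi} \int_A \log_2 \frac{f(\lambda)}{\phi} \, d\lambda \tag{8a}$$

$$D(\phi) = \frac{1}{2\pi} \left[ \int_A \phi \, d\lambda + \int_{A'} f(\lambda) \, d\lambda \right] \tag{8b}$$

and

$$A = \{\lambda : f(\lambda) \ge \phi\}, \qquad A' = \{\lambda : f(\lambda) < \phi\}, \qquad A \cup A' = (-\pi, \pi).$$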



Fig. 6. Graphical interpretation of equations (8a) and (8b). The set
$A = (-\pi, \lambda_{-4}) \cup (\lambda_{-3}, \lambda_{-2}) \cup (\lambda_{-1}, \lambda_1) \cup (\lambda_2, \lambda_3) \cup (\lambda_4, \pi)$.
$A' = (\lambda_{-4}, \lambda_{-3}) \cup (\lambda_{-2}, \lambda_{-1}) \cup (\lambda_1, \lambda_2) \cup (\lambda_3, \lambda_4)$.

Hence, if we are given a distortion D, from (8b) we can find $\phi$, and
then from (8a) we can find the theoretically minimum rate R necessary
to achieve a mean square error less than or equal to D. If $\{x_t\}$ consists
of independent Gaussian variates, with variance $\sigma^2$, then
$f(\lambda) = \sigma^2$ and (8a) becomes

$$R(D) = \tfrac{1}{2} \log_2 \frac{\sigma^2}{D} \ \text{bits/symbol}. \tag{9}$$
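The parametric pair (8a), (8b) lends itself to a direct numerical check. The following is a minimal sketch, not part of the original derivation: the function name `rate_distortion_point`, the midpoint Riemann sum, and the grid size are assumptions made for illustration. It evaluates R and D at a given level $\phi$ for any spectrum supplied as a Python callable, and applies it to a white spectrum so the result can be compared with (9).

```python
import math

def rate_distortion_point(phi, psd, n_grid=2000):
    """Evaluate (8a) and (8b) at level phi with a midpoint Riemann sum.

    psd is a hypothetical callable returning f(lambda) on (-pi, pi).
    Returns (R in bits/symbol, D).
    """
    d_lam = 2.0 * math.pi / n_grid
    rate_sum = 0.0
    dist_sum = 0.0
    for k in range(n_grid):
        lam = -math.pi + (k + 0.5) * d_lam
        f = psd(lam)
        if f >= phi:                      # lambda belongs to the set A
            rate_sum += math.log2(f / phi) * d_lam
            dist_sum += phi * d_lam
        else:                             # lambda belongs to the set A'
            dist_sum += f * d_lam
    return rate_sum / (4.0 * math.pi), dist_sum / (2.0 * math.pi)

# White spectrum with variance 2; (9) gives R = 0.5 * log2(2 / 0.5) at D = 0.5.
white = lambda lam: 2.0
R, D = rate_distortion_point(0.5, white)
```

Sweeping `phi` from 0 up to the maximum of the spectrum traces out the whole curve; once `phi` exceeds the maximum, the set A is empty and the rate is zero.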



If we restrict the class of sources to be wide sense Markov of order
n, then $f(\lambda)$ assumes the following form:

$$f(\lambda) = \frac{K}{\prod_{j=1}^{n} |e^{i\lambda} - a_j|^2} \tag{10}$$

with $0 < a_j < 1$, $a_j \ne a_k$ if $j \ne k$, and K is chosen to satisfy

$$\sigma^2 = E\{x_t^2\} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\lambda) \, d\lambda. \tag{11}$$

In the remainder of this paper we consider some properties of the
rate distortion function as given by (8a) and (8b) for processes with
power spectral density (10).*

3.2 The Markov-n Sequence 

In this section we present some results from prediction theory. 
For details and proofs see Refs. 6 and 7. 

A process with power spectral density given in (10) is known as a
Markov-n process (Ref. 7). Performing the indicated multiplication in (10)
results in

$$f(\lambda) = \frac{K}{\prod_{j=1}^{n} |e^{i\lambda} - a_j|^2} = \frac{K}{|e^{in\lambda} + b_1 e^{i(n-1)\lambda} + \cdots + b_n|^2}. \tag{12}$$

A sequence with the spectrum (12) can be shown to satisfy the
autoregressive relation

$$x_n + \sum_{i=1}^{n} b_i x_{n-i} = e_n \tag{13}$$

where $\{e_n\}$ is a sequence of uncorrelated random variables with variance
K.

Writing (13) in the form

$$x_n = -\sum_{i=1}^{n} b_i x_{n-i} + e_n \tag{14}$$

it can be shown by the orthogonality principle (Ref. 8, Section VII-C)
that the best linear predictor in the mean square sense, of $x_n$ given the
infinite past is just

$$\hat{x}_n = -\sum_{i=1}^{n} b_i x_{n-i}. \tag{15}$$

Hence for a Markov-n process the best prediction involves only the
n previous samples.
The error is

$$e = x_n - \hat{x}_n = e_n. \tag{16}$$

The minimum mean square error is thus

$$\sigma_m^2 = E(e_n)^2 = K. \tag{17}$$

* T. Berger, in a recent paper, considers similar properties for the Wiener
process (Ref. 4).

From (10) and (17)

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log_2 f(\lambda)\,d\lambda = \log_2 \sigma_m^2 - \frac{1}{2\pi}\sum_{j=1}^{n}\int_{-\pi}^{\pi} \log_2 |e^{i\lambda} - a_j|^2 \, d\lambda. \tag{18}$$

From Peirce's tables, number 540, it can be shown that the integral is
zero (recalling that $0 < a_j < 1$). We state our conclusion as a theorem.

Theorem 1: For a sequence with spectrum given in (10) the minimum
mean square error resulting from an optimal prediction one step ahead is
$\sigma_m^2$, where

$$\log_2 \sigma_m^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log_2 f(\lambda)\,d\lambda. \tag{19}$$

Theorem 1 is a special case of the theorem proved in Ref. 6, page 183.
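Theorem 1 can be checked numerically for a particular spectrum. The sketch below is an illustration under assumed values (the coefficients 0.5 and 0.8, the constant K = 0.3, and the helper names are hypothetical, not from the paper). It evaluates the right side of (19) by a midpoint sum for a second-order spectrum of the form (10) and recovers K, in agreement with (17).

```python
import math

def markov_psd(lam, a_list, K):
    """Spectrum of the form (10): K / prod_j |exp(i*lam) - a_j|^2."""
    denom = 1.0
    for a in a_list:
        # For real a, |exp(i*lam) - a|^2 = 1 - 2 a cos(lam) + a^2.
        denom *= 1.0 - 2.0 * a * math.cos(lam) + a * a
    return K / denom

def log2_prediction_error(a_list, K, n_grid=2048):
    """Right side of (19), approximated by a midpoint Riemann sum."""
    d_lam = 2.0 * math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        lam = -math.pi + (k + 0.5) * d_lam
        total += math.log2(markov_psd(lam, a_list, K)) * d_lam
    return total / (2.0 * math.pi)

# Hypothetical second-order example; the result should be close to K = 0.3.
sigma_m2 = 2.0 ** log2_prediction_error([0.5, 0.8], K=0.3)
```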

3.3 Evaluation of R(D) for $D \le f(\pi)$

We next consider the particular form that equations (8a) and (8b)
assume when $f(\lambda)$ is as given in (10).

Theorem 2: Given a process with

$$f(\lambda) = \frac{K}{\prod_{j=1}^{n} |e^{i\lambda} - a_j|^2}$$

for some integer n, then for mean square errors satisfying
$0 \le D \le f(\pi)$, R(D) is given by

$$R(D) = \tfrac{1}{2} \log_2 \frac{\sigma_m^2}{D} \ \text{bits/symbol}. \tag{20}$$

Proof: From (8a) and (8b)

$$D = \frac{1}{2\pi}\left[\int_A \phi\,d\lambda + \int_{A'} f(\lambda)\,d\lambda\right].$$

The power spectral density $f(\lambda)$ is monotonically decreasing in
$|\lambda|$ with a minimum at $\lambda = \pi$. Hence for $\phi$ in the range
$0 \le \phi \le f(\pi)$, $A = (-\pi, \pi)$, $A' = \emptyset$, and

$$D = \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi \, d\lambda = \phi. \tag{21}$$

It follows that

$$R(\phi) = R(D) = \frac{1}{4\pi}\int_{-\pi}^{\pi} \log_2 f(\lambda)\,d\lambda - \tfrac{1}{2} \log_2 D. \tag{22}$$

From Theorem 1 the first term is $\tfrac{1}{2} \log_2 \sigma_m^2$ so
$R(D) = \tfrac{1}{2} \log_2 (\sigma_m^2 / D)$ which holds for $0 < D \le f(\pi)$.
This is (20).

The rate-distortion function (20) is precisely the rate-distortion
function of a process consisting of independent Gaussian random variables
with mean 0 and variance $\sigma_m^2$ [see (9)].
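The statement of Theorem 2 can be illustrated by solving (8b) for the level $\phi$ at a prescribed D and then evaluating (8a). The sketch below is a hedged illustration: the bisection search, the grid size, and the first-order example (a = 0.5, K = 1, so that $f(\pi) = 1/2.25$) are assumptions chosen for brevity. For D below $f(\pi)$ the computed rate should agree with (20); above it, the rate should exceed the value that (20) would give.

```python
import math

def markov_psd(lam, a_list, K):
    """Spectrum of the form (10)."""
    denom = 1.0
    for a in a_list:
        denom *= 1.0 - 2.0 * a * math.cos(lam) + a * a
    return K / denom

def rate_for_distortion(D, a_list, K, n_grid=2000):
    """Solve (8b) for phi by bisection, then evaluate (8a). Returns (R, phi)."""
    d_lam = 2.0 * math.pi / n_grid
    f = [markov_psd(-math.pi + (k + 0.5) * d_lam, a_list, K) for k in range(n_grid)]

    def distortion(phi):
        # (1 / 2 pi) * sum of min(f, phi) * d_lam reduces to a plain average.
        return sum(min(v, phi) for v in f) / n_grid

    lo, hi = 0.0, max(f)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if distortion(mid) < D:
            lo = mid
        else:
            hi = mid
    phi = 0.5 * (lo + hi)
    rate = sum(math.log2(v / phi) for v in f if v > phi) / (2.0 * n_grid)
    return rate, phi

rate_low, phi_low = rate_for_distortion(0.2, [0.5], 1.0)    # D below f(pi)
rate_high, phi_high = rate_for_distortion(0.7, [0.5], 1.0)  # D above f(pi)
```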

Figure 7 illustrates why the rate-distortion function depends on
$f(\pi)$ in this way. The shape of the spectrum of D in (8b) is that which
would be assumed by water if it were poured into a container shaped as
$f(\lambda)$. As we pour in water, it distributes itself uniformly so long as
its level is below $f(\pi)$. Hence D is independent of $f(\lambda)$ so long as
$D < f(\pi)$. Once $D = f(\pi)$ the exact shape of $f(\lambda)$ comes into play.

Fig. 7. Typical Markov spectrum, illustrating water filling interpretation
of the rate-distortion function.

Consider next the predictive communication system of Fig. 3. The
source emits the Gaussian process with power spectral density (10). The
optimum predictor makes a prediction of $x_n$ based on the past samples
$\{x_t\}_{t=-\infty}^{n-1}$. This prediction is then subtracted from $x_n$ and
the error is transmitted. The transmitted sequence is thus the sequence
$\{e_n\}$ [see (14)] which is a sequence of uncorrelated Gaussian random
variables with variance $\sigma_m^2$. Its rate-distortion function is thus also
given by (20), for D in the interval $0 < D \le \sigma_m^2$.
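From (1)

$$\begin{aligned}
S/N &= 10 \log_{10} \frac{\sigma^2}{D} \\
    &= 10 \log_{10} \frac{\sigma^2 \sigma_m^2}{\sigma_m^2 D} \\
    &= 3.01 \log_2 \frac{\sigma_m^2}{D} + 10 \log_{10} \frac{\sigma^2}{\sigma_m^2} \\
    &= 6.02R + 10 \log_{10} \frac{\sigma^2}{\sigma_m^2}
\end{aligned} \tag{23}$$

since R is given by (20). Hence S/N is a linear function of R over the
range of R for which $0 \le D \le f(\pi)$. This range depends on n, the order
of the Markov process, as given in Theorem 3.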
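The linear relation (23) can be confirmed with a few lines of arithmetic. The sketch below is illustrative only: the first-order example with a = 0.9 and the function names are assumptions, and it uses the standard first-order relation $\sigma^2 = K/(1 - a^2)$ with $K = \sigma_m^2$ from (17). The slope is written as $20 \log_{10} 2 \approx 6.02$ dB per bit.

```python
import math

def snr_db_from_rate(R, sigma2, sigma_m2):
    """Linear form (23): S/N in dB as a function of R in bits/symbol."""
    return 20.0 * math.log10(2.0) * R + 10.0 * math.log10(sigma2 / sigma_m2)

def snr_db_from_distortion(D, sigma2):
    """Definition (1): S/N = 10 log10(sigma^2 / D)."""
    return 10.0 * math.log10(sigma2 / D)

# Hypothetical first-order source: a = 0.9, sigma_m^2 = K = 0.19, hence sigma^2 = 1.
a, sigma_m2 = 0.9, 0.19
sigma2 = sigma_m2 / (1.0 - a * a)
R = 2.0
D = sigma_m2 * 2.0 ** (-2.0 * R)      # (20) solved for D; here D is below f(pi)
snr_from_R = snr_db_from_rate(R, sigma2, sigma_m2)
snr_from_D = snr_db_from_distortion(D, sigma2)
```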

Theorem 3: For an nth order Gaussian Markov process, the rate-distortion
function is given by

$$R(D) = \tfrac{1}{2} \log_2 \frac{\sigma_m^2}{D} \ \text{bits/symbol}$$

for rates $R \ge R_{\min}$. The value of $R_{\min}$ depends on the exact shape of
the power spectral density $f(\lambda)$ and assumes a value satisfying

$$0 < R_{\min} < n \ \text{bits/symbol} \tag{24}$$

depending on the $a_j$'s of $f(\lambda)$ [see (10)].

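Proof: From (10)

$$f(\lambda) = \frac{K}{\prod_{j=1}^{n} |e^{i\lambda} - a_j|^2}.$$

From this

$$f(\pi) = \frac{K}{\prod_{j=1}^{n} |1 + a_j|^2}. \tag{25}$$

At $D = f(\pi)$

$$R_{\min} = R(f(\pi)) = \tfrac{1}{2} \log_2 \frac{\sigma_m^2}{f(\pi)} \ \text{bits/symbol} \tag{26}$$

which from Theorem 1 is

$$\begin{aligned}
R_{\min} &= \tfrac{1}{2}\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\log_2 f(\lambda)\,d\lambda - \log_2 f(\pi)\right] \\
&= \tfrac{1}{2}\left[\log_2 K - \frac{1}{2\pi}\sum_{j=1}^{n}\int_{-\pi}^{\pi}\log_2 |e^{i\lambda} - a_j|^2\,d\lambda - \log_2 K + \sum_{j=1}^{n}\log_2 (1 + a_j)^2\right].
\end{aligned} \tag{27}$$

As in (18) the integral is zero and

$$R_{\min} = \sum_{j=1}^{n} \log_2 (1 + a_j) \ \text{bits/symbol}. \tag{28}$$

Since $0 < a_j < 1$, each term satisfies $0 < \log_2 (1 + a_j) < 1$. Hence
$0 < R_{\min} < n$ bits/symbol, which is the desired result.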
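As a small worked check of (25) through (28), the sketch below computes $R_{\min}$ in two ways for a third-order example. The coefficient values and the function names are hypothetical and chosen only for illustration.

```python
import math

def r_min_closed_form(a_list):
    """(28): R_min = sum_j log2(1 + a_j), in bits/symbol."""
    return sum(math.log2(1.0 + a) for a in a_list)

def r_min_from_spectrum(a_list, K):
    """(26): R_min = 0.5 * log2(sigma_m^2 / f(pi)), with sigma_m^2 = K from (17)."""
    f_pi = K
    for a in a_list:
        f_pi /= (1.0 + a) ** 2            # (25)
    return 0.5 * math.log2(K / f_pi)

coeffs = [0.3, 0.6, 0.9]                  # hypothetical third-order example
r_min = r_min_closed_form(coeffs)
r_min_check = r_min_from_spectrum(coeffs, K=0.7)
```

The value of K cancels in (26), so `r_min_check` does not depend on the K passed in.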

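3.4 Behavior of R(D) at $D = f(\pi)$

With $f(\lambda)$ as given in (10), the rate-distortion function is, from (20)

$$R(D) = \tfrac{1}{2} \log_2 \frac{\sigma_m^2}{D}$$

for $0 < D \le f(\pi)$, and from (8a) and (8b)

$$R(\lambda) = \frac{1}{2\pi}\int_0^{\lambda} \log_2 \frac{f(x)}{f(\lambda)}\,dx \tag{29a}$$

$$D(\lambda) = \frac{1}{\pi}\left[\int_0^{\lambda} f(\lambda)\,dx + \int_{\lambda}^{\pi} f(x)\,dx\right] \tag{29b}$$

for $f(\pi) \le D \le \sigma^2$. Writing (8a) and (8b) in this form follows from
the observation that for a monotonically decreasing power spectral density
the set A, restricted by symmetry to positive frequencies, equals the simply
connected interval $(0, \lambda)$ and $\phi = f(\lambda)$, for the appropriate
$\lambda$.
From (20)

$$\frac{d^k R}{dD^k} = (-1)^k \frac{(k-1)!}{2 D^k \ln 2}, \qquad 0 < D < f(\pi) \tag{30}$$

and from (29)

$$\frac{dR}{dD} = -\frac{1}{2 f(\lambda)}\,\frac{1}{\ln 2} \tag{31}$$

$$\frac{d^2 R}{dD^2} = \frac{\pi}{2 \lambda f^2(\lambda)}\,\frac{1}{\ln 2} \tag{32}$$

$$\frac{d^3 R}{dD^3} = -\frac{\pi^2}{2}\,\frac{f^2(\lambda) + 2\lambda f(\lambda) f'(\lambda)}{\lambda^3 f^4(\lambda) f'(\lambda)}\,\frac{1}{\ln 2} \tag{33}$$

for $f(\pi) < D < \sigma^2$, where $f'(\lambda) = df(\lambda)/d\lambda$.

From (30), (31), and (32) we see that $dR/dD$ and $d^2R/dD^2$ are
continuous at $D = f(\pi)$. But from (33) we see that $d^3R/dD^3$ is unbounded
as $D \to f(\pi)$ from above (since $f'(\lambda) \to 0$ as $\lambda \to \pi$),
whereas $d^3R/dD^3$ is bounded as $D \to f(\pi)$ from below. Hence
$d^3R/dD^3$ is discontinuous at $D = f(\pi)$.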
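The slope formula (31) can be checked against a finite-difference estimate taken along the parametric curve. The sketch below is a hedged illustration for a first-order spectrum: the parameters a = 0.5 and K = 1, the midpoint integration, and the step size are assumptions. It takes the parametric form to be $R(\lambda) = \frac{1}{2\pi}\int_0^{\lambda}\log_2[f(x)/f(\lambda)]\,dx$ and $D(\lambda) = \frac{1}{\pi}[\lambda f(\lambda) + \int_{\lambda}^{\pi} f(x)\,dx]$.

```python
import math

def psd(lam, a=0.5, K=1.0):
    """First-order instance of the spectrum (10)."""
    return K / (1.0 - 2.0 * a * math.cos(lam) + a * a)

def _midpoint(g, lo, hi, n=2000):
    """Midpoint-rule integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def rate(lam):
    """Parametric rate R(lambda), as in (29a)."""
    f_lam = psd(lam)
    return _midpoint(lambda x: math.log2(psd(x) / f_lam), 0.0, lam) / (2.0 * math.pi)

def distortion(lam):
    """Parametric distortion D(lambda), as in (29b)."""
    return (lam * psd(lam) + _midpoint(psd, lam, math.pi)) / math.pi

def slope_numeric(lam, h=1e-3):
    """Central-difference estimate of dR/dD along the parametric curve."""
    return (rate(lam + h) - rate(lam - h)) / (distortion(lam + h) - distortion(lam - h))

def slope_formula(lam):
    """(31): dR/dD = -1 / (2 f(lambda) ln 2)."""
    return -1.0 / (2.0 * psd(lam) * math.log(2.0))
```

At $\lambda = \pi$ the function `distortion` returns $f(\pi)$, which is the point where the two branches of R(D) meet.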

IV. QUANTIZING CORRELATED SOURCES 

4.1 Introduction 

Consider a source that emits a sequence of independent Gaussian
random variables of mean 0, variance $\sigma^2$. It is desired to optimally
quantize the source by using an M level quantizer. Max (Ref. 10) has shown
that by optimally choosing the quantizer input ranges and output levels, a
mean square quantization error of

$$D_q = K(M) \frac{\sigma^2}{M^2} \tag{34}$$

can be achieved where K(M) is a function of M. Further, it is shown
numerically that $K(M) \le 2.72$, and that the inequality becomes an
equality as $M \to \infty$. Hence for any M

$$D_q \le 2.72\, \frac{\sigma^2}{M^2}. \tag{35}$$

For an M level quantizer the number of bits/symbol is $R = \log_2 M$, so
that (35) can be written

$$D_q \le 2.72\, \sigma^2\, 2^{-2R}. \tag{36}$$

The rate-distortion function of the process is from (9)

$$R = \tfrac{1}{2} \log_2 \frac{\sigma^2}{D}$$

so that the minimum possible mean square error achievable with a fixed
bit rate R is

$$D_{\min} = \sigma^2\, 2^{-2R}. \tag{37}$$

Hence Max's scheme can be made to achieve a mean square error
satisfying

$$D_q \le 2.72\, D_{\min} \tag{38}$$

where $D_{\min}$ is the minimum mean square error as given by rate-distortion
theory.

In this section we find a bound on a quantizing system studied by
Huang and Schultheiss (Ref. 11). Our result is that (38) holds also for
correlated sources, when $D_{\min}$ is as given by the appropriate
rate-distortion function. For the case of Markov sources we plot this result
in Fig. 2.
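The behavior of K(M) in (34) can be reproduced approximately with a Lloyd-type iteration for a unit-variance Gaussian. The sketch below is an illustration and not Max's original tabulation: the initial levels, the fixed iteration count, and the helper names are assumptions. Each pass replaces the output levels by the conditional means of their nearest-neighbor cells, and the distortion of the resulting quantizer is $1 - \sum_i p_i c_i^2$ when the outputs $c_i$ are the cell means and $p_i$ the cell probabilities.

```python
import math

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _cells(levels):
    """Nearest-neighbor cell edges for sorted output levels."""
    inner = [0.5 * (levels[i] + levels[i + 1]) for i in range(len(levels) - 1)]
    return [-math.inf] + inner + [math.inf]

def _centroids(edges):
    """Cell probabilities and conditional means for a unit-variance Gaussian."""
    probs, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = _cdf(hi) - _cdf(lo)
        probs.append(p)
        means.append((_pdf(lo) - _pdf(hi)) / p)
    return probs, means

def k_of_m(M, iters=500):
    """Approximate K(M) = D_q * M^2 / sigma^2 of (34) with sigma^2 = 1."""
    levels = [-2.0 + 4.0 * (i + 0.5) / M for i in range(M)]   # assumed starting point
    for _ in range(iters):
        _, levels = _centroids(_cells(levels))
    probs, means = _centroids(_cells(levels))
    distortion = 1.0 - sum(p * c * c for p, c in zip(probs, means))
    return distortion * M * M

k_values = {M: k_of_m(M) for M in (1, 2, 4, 8)}
```

For M = 2 the iteration is already at its fixed point, giving $K(2) = 4(1 - 2/\pi)$; the values for larger M stay below the 2.72 of (35).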

4.2 Description of the System 

Referring to Fig. 4, the source emits correlated Gaussian variates (not
necessarily Markov), of mean 0 and with correlation matrix
$\mathcal{R} = E(XX^T)$. The operator A accumulates source N-vectors X, and
rotates them in such a way that

$$Y = AX \tag{39}$$

and

$$E(YY^T) = E(AXX^TA^T) = A\,E(XX^T)\,A^T = A\mathcal{R}A^T = J \tag{40}$$

where J is a diagonal matrix whose ith entry is $\lambda_i$, the ith eigenvalue
of $\mathcal{R}$. Hence Y is an N-vector whose components are independent
random variables with mean 0 and variance $\lambda_i$, and A is a unitary
transformation.

The sequence of independent variates $\{y_i\}$ (the components of Y)
are then quantized step by step (Refs. 10, 11). The jth quantization can be
optimized to produce a mean square error of

$$D_j = K(M_j)\,\lambda_j M_j^{-2} \le 2.72\,\lambda_j M_j^{-2} \tag{41}$$

where $M_j$ is the number of quantization levels used to quantize $y_j$.
Denoting the output of the quantizer by the vector Y', and the
reconstructed source vector by $X' = A^T Y'$, the average mean square error is

$$D = \frac{1}{N} E(Y - Y')^T (Y - Y') = \frac{1}{N} E(Y - Y')^T A A^T (Y - Y') = \frac{1}{N} E(X - X')^T (X - X') \tag{42}$$

where we have used the fact that for a unitary transformation
$AA^T = AA^{-1} = I$, the identity matrix. Hence the system mean square error
equals the quantizer mean square error.
From (41) and (42)

$$D = \frac{1}{N} E(Y - Y')^T (Y - Y') = \frac{1}{N} E \sum_{i=1}^{N} (y_i - y_i')^2 \le \frac{2.72}{N} \sum_{i=1}^{N} \lambda_i M_i^{-2} \equiv D_u. \tag{43}$$
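The rotation in (39) and (40) can be seen in the smallest nontrivial case, N = 2. The sketch below is illustrative only: the correlation value 0.8 and the helper names are assumptions, and the rotation matrix is written down from the known eigenvectors of a 2 by 2 symmetric matrix with equal diagonal entries rather than computed by a general eigenvalue routine.

```python
import math

def matmul(P, Q):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

rho = 0.8                                   # hypothetical adjacent-sample correlation
corr = [[1.0, rho], [rho, 1.0]]             # correlation matrix E(X X^T) for N = 2
s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]                       # rows are orthonormal eigenvectors of corr

J = matmul(matmul(A, corr), transpose(A))   # (40): expected diag(1 + rho, 1 - rho)
identity = matmul(A, transpose(A))          # the unitary property used after (42)
```

The diagonal entries of `J` are the variances of the rotated components, which are the quantities that enter (41).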

4.3 Optimization over the $M_j$

We next tighten the upper bound by optimally choosing the $M_j$'s
subject to the following constraints.

(i) $M_j \ge 1$ for every j. The quantizer must have at least one output
level.

(ii) The bit rate is limited by the channel capacity, C bits per symbol.
We can thus use $M = 2^C$ levels per symbol or $M^N$ levels per vector.
This implies the constraint

$$M^N = \prod_{i=1}^{N} M_i. \tag{44}$$

Hence we wish to minimize the right side of (43) subject to (44), while
keeping in mind constraint (i).

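With $\nu$ a Lagrange multiplier, we form

$$F = D_u + \nu M^N \tag{45}$$

with $M^N$ expressed through (44). A differentiation with respect to $M_k$
yields

$$\frac{\lambda_k}{M_k^2} = \mu \tag{46}$$

where $\mu$ is a constant. Using (44) to solve for the constant gives

$$M_k = \left[\frac{\lambda_k M^2}{\left(\prod_{i=1}^{N} \lambda_i\right)^{1/N}}\right]^{1/2} \tag{47}$$

and

$$D_u = \frac{2.72}{M^2} \left(\prod_{i=1}^{N} \lambda_i\right)^{1/N}. \tag{48}$$

However constraint (i) will only hold if in (47)

$$\lambda_k \ge \frac{1}{M^2} \left(\prod_{i=1}^{N} \lambda_i\right)^{1/N} \tag{49}$$

for every k.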

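The right side of (49) can be written

$$\frac{1}{M^2} \left(\prod_{i=1}^{N} \lambda_i\right)^{1/N} = \frac{1}{M^2}\, 2^{\frac{1}{N} \sum_{i=1}^{N} \log_2 \lambda_i} \tag{50}$$

$$\to \frac{1}{M^2}\, 2^{\frac{1}{2\pi} \int_{-\pi}^{\pi} \log_2 f(\lambda)\, d\lambda} \tag{51}$$

$$= \frac{\sigma_m^2}{M^2} \tag{52}$$

where we have used the fact that the eigenvalues of $\mathcal{R}$ approach the
ordinates of $f(\lambda)$ equally spaced in $(-\pi, \pi)$ as $N \to \infty$
(see Ref. 6), and then applied the definition of a Riemann integral. Finally,
we used (19). Hence the constraint (i) is met if

$$\lambda_k \ge \frac{\sigma_m^2}{M^2} \tag{53}$$

for all k. Using (50), (51), and (52), (48) becomes

$$D_u = 2.72\, \frac{\sigma_m^2}{M^2}. \tag{54}$$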



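In terms of signal to noise ratio we get

$$\begin{aligned}
S/N = 10 \log_{10} \frac{\sigma^2}{D} &\ge 10 \log_{10} \frac{\sigma^2}{D_u} \\
&= 10 \log_{10} \frac{\sigma^2}{\sigma_m^2} + 20 \log_{10} 2\, \log_2 M - 4.34 \\
&= 10 \log_{10} \frac{\sigma^2}{\sigma_m^2} + 6.02R - 4.34
\end{aligned} \tag{55}$$

for

$$R \ge \tfrac{1}{2} \log_2 \frac{\sigma_m^2}{\lambda_k} \quad \text{for all } k, \tag{56}$$

which is (53) restated, and where we used the relation

$$R = \log_2 M. \tag{57}$$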

Suppose, however, that for some $\lambda_k$'s (53) is not met. Specifically,
arrange the eigenvalues such that
$\lambda_1 \ge \lambda_2 \ge \lambda_3 \cdots \ge \lambda_N$ and suppose
that (47) yields

$$M_k \ge 1, \qquad k = 1, 2, \cdots, J \tag{58a}$$

$$M_k < 1, \qquad k = J + 1, \cdots, N. \tag{58b}$$

Set those $M_k$ in (58b) equal to one, and reoptimize over the $M_k$ of
(58a), the expression

$$D' = \frac{2.72}{N} \sum_{k=1}^{J} \frac{\lambda_k}{M_k^2} \tag{59}$$

subject to the constraint

$$\prod_{k=1}^{J} M_k = M^N. \tag{60}$$

We would find that optimally

$$\frac{\lambda_k}{M_k^2} = \gamma, \qquad k = 1, \cdots, J \tag{61}$$

where the right side of (61) is a constant. Without loss of generality,
we can assume that all $M_k$ obtained from (61) are greater than or equal
to one. Otherwise we would set the infeasible $M_k$ equal to one, and
reoptimize. The procedure would return us to an equation similar to
(61). As $N \to \infty$

$$D_u = \frac{2.72}{N} \left(\sum_{k=1}^{J} \gamma + \sum_{k=J+1}^{N} \lambda_k\right) \to 2.72 \cdot \frac{1}{2\pi} \left[\int_A \gamma\, d\lambda + \int_{A'} f(\lambda)\, d\lambda\right] \tag{62}$$

where A and A' are as given in (8) with $\phi$ replaced by $\gamma$.
Similarly, from (60) and (61)

$$\prod_{k=1}^{J} \left(\frac{\lambda_k}{\gamma}\right)^{1/2} = M^N \tag{63}$$

which, upon rearrangement, becomes

$$R = \log_2 M = \frac{1}{2N} \sum_{k=1}^{J} \log_2 \frac{\lambda_k}{\gamma} \to \frac{1}{4\pi} \int_A \log_2 \frac{f(\lambda)}{\gamma}\, d\lambda. \tag{64}$$



3074 THE BELL SYSTEM TECHNICAL JOURNAL, NOVEMBER 1969 

By comparing (8a) and (8b) with (62) and (64) we see that (62) has 
the optimal spectrum for a rate given by (64). This implies that our 
procedure of setting infeasible M k 's equal to one does indeed lead to an 
optimum result. 

Further, the term in brackets in (62) is the minimum mean square
error for a rate given by (64). Hence the quantization procedure has
yielded

$$D_q \le 2.72\, D_{\min}$$

which is (38).
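The procedure of setting infeasible $M_k$ to one and reoptimizing can be sketched for a finite N. The code below is an illustration under stated assumptions: the eigenvalues are replaced by equally spaced samples of a first-order spectrum (a stand-in motivated by the asymptotic statement used for (51)), the parameters a = 0.9, K = 0.19, N = 64 are hypothetical, and the function names are not from the paper. It finds the constant $\gamma$ for which $\lambda_k / M_k^2 = \gamma$ on the retained components while the product of the $M_k$ equals $M^N$, then compares the bound (43) with 2.72 times the distortion obtained by clipping the eigenvalues at $\gamma$.

```python
import math

def allocate_levels(eigs, M):
    """Return (levels, gamma) with prod M_k = M^N and M_k >= 1.

    Retained components satisfy lambda_k / M_k^2 = gamma; the rest get M_k = 1.
    """
    N = len(eigs)
    logs = sorted((math.log2(v) for v in eigs), reverse=True)
    target = 2.0 * N * math.log2(M)           # log2 of the product of M_k^2
    J = N
    while True:
        log_gamma = (sum(logs[:J]) - target) / J
        if logs[J - 1] >= log_gamma or J == 1:
            break
        J -= 1                                 # drop the smallest retained eigenvalue
    gamma = 2.0 ** log_gamma
    levels = [max(1.0, math.sqrt(v / gamma)) for v in eigs]
    return levels, gamma

def upper_bound(eigs, levels):
    """Right side of (43)."""
    return 2.72 / len(eigs) * sum(v / (m * m) for v, m in zip(eigs, levels))

# Stand-in eigenvalues: samples of K / (1 - 2 a cos(lam) + a^2) at N midpoints.
N, a, K = 64, 0.9, 0.19
eigs = [K / (1.0 - 2.0 * a * math.cos(-math.pi + (k + 0.5) * 2.0 * math.pi / N) + a * a)
        for k in range(N)]
levels, gamma = allocate_levels(eigs, math.sqrt(2.0))    # R = 0.5 bit/symbol
d_u = upper_bound(eigs, levels)
d_clip = sum(min(v, gamma) for v in eigs) / N            # finite-N analog of (8b)
```

The resulting `levels` are generally not integers, which is the approximation discussed at the end of this section.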

This result is plotted in dB in Fig. 2, for the case of a Markov-n
process.

There is an approximation involved in obtaining this result. The $M_j$
obtained may not be integers. However, the large $M_j$ will be little
affected by rounding, and the looseness of the bound of (38) for small $M_j$
counteracts the effects of rounding the small $M_j$. In fact, for very small
$M_j$ the bound is conservative, as we can see from Fig. 2. Clearly S/N
should approach zero as R goes to zero. Hence our lower bound on S/N
is loose in this range.